@nguidotti nguidotti commented Nov 6, 2025

This PR changes how the explored node counter is updated, such that it is only incremented after solving a node in the B&B.

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

Summary by CodeRabbit

  • Refactor
    • Improved progress reporting cadence and coordination in the branch-and-bound solver.
    • More accurate and consistent node-counting in progress displays.
    • Reduced noisy/duplicate log output and ensured coordinated reporting from solver start.

@nguidotti nguidotti requested a review from a team as a code owner November 6, 2025 14:45
@nguidotti nguidotti requested review from akifcorduk, chris-maes and rg20 and removed request for rg20 November 6, 2025 14:45
coderabbitai bot commented Nov 6, 2025

📝 Walkthrough

Node-counting and progress-logging were refactored: per-step counter updates were moved to occur around cutoff handling, and a new atomic flag should_report_ was added to gate and coordinate when logging occurs. Logs now compute/display metrics using local counters, then reset logging state.

Changes

Cohort: Header member addition
File(s): cpp/src/dual_simplex/branch_and_bound.hpp
Summary: Added std::atomic<bool> should_report_ member (placed after solver status) for coordinated/report-gating behavior. No public API/signature changes.

Cohort: Node counting & logging refactor
File(s): cpp/src/dual_simplex/branch_and_bound.cpp
Summary: Moved updates of nodes_explored, nodes_unexplored, and nodes_since_last_log to occur around cutoff handling instead of per-step; introduced should_report_ gating to control when verbose logging occurs; logging now computes gap/metrics from local counters, resets log state (nodes_since_last_log, last_log) and re-enables reporting; adjusted multiple paths (exploration_ramp_up, explore_subtree, solve initialization and ramp-by-thread) to use the new cadence and counter placement.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Pay attention to thread-safety and atomic semantics around should_report_.
  • Verify that counter increments/decrements remain consistent across all cutoffs and early-return paths.
  • Confirm logging state reset (nodes_since_last_log, last_log) and timing matches intended reporting cadence.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name: Docstring Coverage
Status: ⚠️ Warning
Explanation: Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.
Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)
Check name: Description Check
Status: ✅ Passed
Explanation: Check skipped - CodeRabbit’s high-level summary is enabled.

Check name: Title check
Status: ✅ Passed
Explanation: The title 'Fix explored nodes counter' directly matches the PR's core objective of correcting how the explored node counter is updated in the branch-and-bound process.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9734677 and d499bea.

📒 Files selected for processing (2)
  • cpp/src/dual_simplex/branch_and_bound.cpp (8 hunks)
  • cpp/src/dual_simplex/branch_and_bound.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cu,cuh,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...

Files:

  • cpp/src/dual_simplex/branch_and_bound.hpp
  • cpp/src/dual_simplex/branch_and_bound.cpp
**/*.{h,hpp,py}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Verify C API does not break ABI stability (no struct layout changes, field reordering); maintain backward compatibility in Python and server APIs with deprecation warnings

Files:

  • cpp/src/dual_simplex/branch_and_bound.hpp
**/*.{cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cpp,hpp,h}: Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths
Validate input sanitization to prevent buffer overflows and resource exhaustion attacks; avoid unsafe deserialization of problem files
Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state

Files:

  • cpp/src/dual_simplex/branch_and_bound.hpp
  • cpp/src/dual_simplex/branch_and_bound.cpp
**/*.{cu,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Files:

  • cpp/src/dual_simplex/branch_and_bound.hpp
  • cpp/src/dual_simplex/branch_and_bound.cpp
🧠 Learnings (8)
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.hpp
  • cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.hpp
  • cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.hpp
  • cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.hpp
  • cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cpp,hpp,h} : Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state

Applied to files:

  • cpp/src/dual_simplex/branch_and_bound.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: wheel-build-cuopt-sh-client / 13.0.2, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
  • GitHub Check: checks / check-style
🔇 Additional comments (8)
cpp/src/dual_simplex/branch_and_bound.hpp (1)

162-163: LGTM: Coordinated logging gate added.

The atomic boolean should_report_ provides a centralized gate to coordinate logging cadence across parallel exploration paths, aligning with the updated counter tracking logic in the implementation file.

cpp/src/dual_simplex/branch_and_bound.cpp (7)

783-792: Correct cutoff handling: node not counted as explored.

The cutoff path now only decrements nodes_unexplored without incrementing nodes_explored, which is correct since the node is fathomed before solving. This aligns with the PR objective to increment the explored counter only after solving a node.


803-826: Logging gated by should_report_ with local counter snapshots.

The logging block uses should_report_.exchange(false) to serialize writes and displays local snapshots of counters. This ensures only one thread logs at a time while allowing other threads to continue updating global counters.
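
The gating idea described here can be sketched in isolation. This is a minimal illustration, not the code from the PR: the type report_gate_t and its member functions are hypothetical names, and only the use of std::atomic<bool>::exchange mirrors what the review text describes. exchange(false) returns the previous value, so only the caller that observes true proceeds to log.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical wrapper around a flag like should_report_ (names are assumptions).
struct report_gate_t {
  std::atomic<bool> should_report{true};

  // Atomically clears the flag and returns its previous value.
  // Only the caller that sees `true` is allowed to write the log line.
  bool try_acquire() { return should_report.exchange(false); }

  // Re-enables reporting once the log line has been written.
  void release() { should_report.store(true); }
};
```

A thread would call try_acquire() before logging and release() after resetting the log state; every other thread that races on the flag in between gets false and skips the report.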


857-859: Core fix: counters updated after solving the node.

The explored and unexplored counters are now updated after solve_node returns (line 844-855), ensuring that nodes_explored is incremented only after the LP relaxation is actually solved. This is the primary fix for the PR.
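
A minimal sketch of the counting rule may help here. The struct bnb_stats_t and the helpers on_cutoff and on_solved are hypothetical; only the counter names come from the review. It assumes, as the review comments suggest, that a fathomed node leaves the unexplored set without counting as explored, and that a solved node also advances nodes_since_last_log.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the solver statistics (field names from the review).
struct bnb_stats_t {
  std::atomic<int> nodes_explored{0};
  std::atomic<int> nodes_unexplored{0};
  std::atomic<int> nodes_since_last_log{0};
};

// Node fathomed by the cutoff test: it leaves the queue but was never solved.
inline void on_cutoff(bnb_stats_t& stats) { --stats.nodes_unexplored; }

// Node whose LP relaxation was solved: it counts as explored and as work
// done toward the next progress line.
inline void on_solved(bnb_stats_t& stats)
{
  ++stats.nodes_explored;
  --stats.nodes_unexplored;
  ++stats.nodes_since_last_log;
}
```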


920-924: Consistent cutoff handling in explore_subtree.

Consistent with exploration_ramp_up, the cutoff path only decrements nodes_unexplored without incrementing nodes_explored, correctly implementing the PR objective.


929-955: Single-threaded logging in explore_subtree via task_id check.

The logging block is protected by if (task_id == 0) ensuring only the master best-first thread logs in this phase, with local counter snapshots used for display. The barrier at line 1385 in solve() ensures no overlap with ramp-up phase logging.
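
As a rough sketch of the pattern described above, the function below reads each shared counter once into a local variable and formats the line from those locals, and only task 0 produces output. The function name progress_line and the output format are assumptions made for illustration; they are not taken from the PR.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical helper: returns a progress line on task 0, an empty string otherwise.
inline std::string progress_line(int task_id,
                                 const std::atomic<int>& nodes_explored,
                                 const std::atomic<int>& nodes_unexplored)
{
  if (task_id != 0) { return {}; }
  // Take local snapshots so the printed values do not change mid-line
  // while other threads keep updating the shared counters.
  const int explored   = nodes_explored.load();
  const int unexplored = nodes_unexplored.load();
  return "explored=" + std::to_string(explored) + " unexplored=" + std::to_string(unexplored);
}
```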


982-984: Counters updated after solving in explore_subtree.

Consistent with exploration_ramp_up, the counters are updated after solve_node returns, correctly implementing the fix across both exploration phases.


1368-1368: Initialize should_report_ for startup logging.

Setting should_report_ to true enables immediate coordinated reporting when parallel exploration begins, allowing the first logging decision to proceed.

@nguidotti nguidotti added the bug (Something isn't working), non-breaking (Introduces a non-breaking change), and mip labels Nov 6, 2025
@nguidotti nguidotti added this to the 25.12 milestone Nov 6, 2025
@nguidotti
Contributor Author

/ok to test bd697c9

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/dual_simplex/branch_and_bound.cpp (1)

723-737: Use the current lower bound when logging

Here we print progress using user_lower = compute_user_objective(original_lp_, root_objective_). That root objective is the initial LP bound, so after the tree tightens the lower bound the reported gap never reflects it and the metric stays artificially wide. Please base user_lower (and the gap) on the current lower bound you already computed for this node.

diff --git a/cpp/src/dual_simplex/branch_and_bound.cpp b/cpp/src/dual_simplex/branch_and_bound.cpp
@@
-      f_t obj              = compute_user_objective(original_lp_, upper_bound);
-      f_t user_lower       = compute_user_objective(original_lp_, root_objective_);
+      f_t obj              = compute_user_objective(original_lp_, upper_bound);
+      f_t user_lower       = compute_user_objective(original_lp_, lower_bound);
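
To see why the choice of bound matters for the reported gap, here is a small sketch with made-up numbers. The function relative_gap uses one common definition of the relative MIP gap; that formula, the 1e-10 guard, and the sample bounds are all assumptions for illustration, and the solver's own computation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// One common relative-gap definition (an assumption, not the solver's formula).
inline double relative_gap(double upper_bound, double lower_bound)
{
  // Guard against division by a near-zero incumbent objective.
  const double denom = std::max(std::abs(upper_bound), 1e-10);
  return std::abs(upper_bound - lower_bound) / denom;
}
```

With an incumbent of 100, a root bound of 50, and a current lower bound of 90, relative_gap(100, 50) is 0.5 while relative_gap(100, 90) is 0.1, so logging against the root objective would keep showing the wider figure no matter how far the tree has tightened the bound.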
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc49f7a and 9734677.

📒 Files selected for processing (2)
  • cpp/src/dual_simplex/branch_and_bound.cpp (8 hunks)
  • cpp/src/dual_simplex/branch_and_bound.hpp (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cpp/src/dual_simplex/branch_and_bound.cpp (2)
cpp/src/dual_simplex/branch_and_bound.hpp (6)
  • node (89-94)
  • node (89-89)
  • node (96-103)
  • node (96-98)
  • search_tree (233-237)
  • search_tree (258-264)
cpp/src/dual_simplex/solve.cpp (6)
  • compute_user_objective (98-103)
  • compute_user_objective (98-98)
  • compute_user_objective (106-110)
  • compute_user_objective (106-106)
  • compute_user_objective (650-651)
  • compute_user_objective (653-653)

if (lower_bound > upper_bound || rel_gap < settings_.relative_mip_gap_tol) {
search_tree->graphviz_node(settings_.log, node, "cutoff", node->lower_bound);
search_tree->update_tree(node, node_status_t::FATHOMED);
++stats_.nodes_explored;
Contributor

This is a bit strange. I think of nodes explored as the number of nodes we have solved, or perhaps proved infeasible through node presolve. If we are fathoming the node here I would not count it as explored.

search_tree->update_tree(node, node_status_t::FATHOMED);
++stats_.nodes_explored;
--stats_.nodes_unexplored;
++stats_.nodes_since_last_log;
Contributor

Same here. The purpose of nodes since last log is for us to print every so often in a deterministic way. Here we did no work, so I don't think we want to increase this counter.
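
The cadence described in this comment can be sketched as a counter that only advances when a node is actually solved. The type log_cadence_t and the log_interval parameter are hypothetical; the real code also tracks a last_log timestamp, which this sketch leaves out.

```cpp
#include <cassert>

// Hypothetical node-count based cadence: one progress line every
// `log_interval` solved nodes, independent of wall-clock time.
struct log_cadence_t {
  int log_interval;
  int nodes_since_last_log = 0;

  // Call after a node has been solved. Returns true when a progress line
  // is due and resets the counter. Fathomed nodes never call this, so
  // they do not shift the logging schedule.
  bool on_node_solved()
  {
    if (++nodes_since_last_log >= log_interval) {
      nodes_since_last_log = 0;
      return true;
    }
    return false;
  }
};
```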

if (stats_.nodes_explored.load() == nodes_explored) {
stats_.nodes_since_last_log = 0;
stats_.last_log = tic();
bool should_report = should_report_.exchange(false);
Contributor

Do we need to have a member variable to decide if we should log? I think this is increasing the complexity of the code for little return.

Contributor Author

Since this piece of code is run by multiple threads, the atomic here prevents multiple reports from being printed (potentially out of order). But I agree that it is an imperfect solution. I will probably change it when the ramp-up phase is reworked or the report is moved to a separate thread.

}
}

++stats_.nodes_explored;
Contributor

Nit: is there a reason why all of these are ++x instead of x++?
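
For reference, on a std::atomic<int> both forms are atomic read-modify-write operations and differ only in the value the expression yields. The helper names below are hypothetical and exist only to make the two results observable.

```cpp
#include <atomic>
#include <cassert>

// Pre-increment on an atomic yields the value after the increment.
inline int pre_increment(std::atomic<int>& counter) { return ++counter; }

// Post-increment on an atomic yields the value before the increment.
inline int post_increment(std::atomic<int>& counter) { return counter++; }
```

When the result is discarded, as in a standalone ++stats_.nodes_explored; statement, the two forms have the same effect on the counter, so the choice there is a matter of style.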

Contributor

Why are we incrementing the nodes explored here at all?

Contributor

@chris-maes chris-maes left a comment

Let's not count a node as explored if we trivially fathom it.

Let's talk offline about why you increment the nodes explored at the end of explore_subtree.

@github-actions

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

@rgsl888prabhu rgsl888prabhu changed the base branch from main to release/25.12 November 17, 2025 21:35

@chris-maes
Contributor

@nguidotti do you want this to go into the 25.12 release?

@nguidotti
Contributor Author

nguidotti commented Nov 25, 2025

@nguidotti do you want this to go into the 25.12 release?

Since it is quite a straightforward change, I think we can target the 25.12 release. I have already updated the code with your suggestions, and I am now checking that it works on square41 and other MIPLIB instances.

@nguidotti
Contributor Author

Can you review again @chris-maes?

Contributor

@chris-maes chris-maes left a comment

LGTM. Thanks for the fix.

@nguidotti
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 51a6b35 into NVIDIA:release/25.12 Nov 26, 2025
355 of 363 checks passed
@nguidotti nguidotti deleted the fix-explored-counter branch November 26, 2025 09:31